# High-precision image-text matching

ViT SO400M 14 SigLIP2 378

A SigLIP 2 vision-language model trained on the WebLI dataset, supporting zero-shot image classification.
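Zero-shot classification with a model like this comes down to comparing one image embedding against text embeddings of candidate label prompts. The sketch below assumes SigLIP 2 scores image-text pairs the way the original SigLIP does: each pair gets its own sigmoid probability from a scaled cosine similarity plus a bias, so the scores do not have to sum to 1 across labels. The embeddings, the `zero_shot_scores` helper, and the `scale`/`bias` values are hypothetical placeholders, not the model's real API or learned parameters.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def zero_shot_scores(image_emb, label_embs, scale=10.0, bias=-10.0):
    """Score one image against each candidate label independently (sigmoid per pair).

    `scale` and `bias` stand in for a trained model's learned temperature and
    bias; the defaults here are placeholders chosen for illustration only.
    """
    img = l2_normalize(image_emb)
    scores = {}
    for label, emb in label_embs.items():
        txt = l2_normalize(emb)
        cos = sum(a * b for a, b in zip(img, txt))
        scores[label] = sigmoid(scale * cos + bias)
    return scores

# Hypothetical toy embeddings; a real model would produce these with its encoders.
image = [0.9, 0.1, 0.0]
labels = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
scores = zero_shot_scores(image, labels)
best = max(scores, key=scores.get)  # → 'a photo of a cat'
```

Because each label is scored on its own, adding or removing candidate labels does not change the score of the others, which is one practical difference from softmax-based CLIP scoring.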

Spec-Vision-V1 (SVECTOR-CORPORATION; tags: Transformers, Other) is a lightweight open-source multimodal model, described by its publisher as state of the art, designed to integrate visual and textual data closely and supporting a 128K context length.

ViT L 14 CLIPA 336 Datacomp1b

A CLIPA-v2 model: an efficient contrastive image-text model aimed at zero-shot image classification.
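"Contrastive" here refers to training that pulls matching image-text pairs together and pushes mismatched pairs apart within a batch. As a minimal sketch, assuming CLIPA-v2 uses the standard CLIP-style symmetric softmax (InfoNCE) objective, the loss can be written as below. The `clip_contrastive_loss` helper, the toy embeddings, and the fixed `temperature` value are hypothetical illustrations; in practice the temperature is learned and the embeddings come from the model's encoders.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def log_softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch; image i and text i are the matching pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: row i should put its probability mass on column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: column j should put its probability mass on row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Hypothetical toy batch: aligned pairs should give a much lower loss than swapped ones.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The softmax couples every pair in a row, so each image's loss depends on all texts in the batch; this is the batch-level normalization that the sigmoid loss used by SigLIP avoids.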

ViT B 16 SigLIP

SigLIP (Sigmoid Loss for Language Image Pre-training) model trained on the WebLI dataset for zero-shot image classification tasks.
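The "sigmoid loss" in SigLIP's name treats every image-text pair in a batch as an independent binary classification: matching pairs (the diagonal) are positives and all other pairs are negatives, with no softmax over the batch. The sketch below follows the pseudocode published with SigLIP (`logits = t * dot + b`, labels of +1 on the diagonal and -1 elsewhere, summed log-sigmoid divided by the batch size). The `siglip_loss` helper, the toy embeddings, and the fixed `t`/`b` values are hypothetical; a trained model learns `t` and `b`.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def siglip_loss(image_embs, text_embs, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over a batch.

    Embeddings are assumed to be L2-normalized already. `t` (scale) and `b`
    (bias) are placeholder values standing in for learned parameters.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(image_embs[i], text_embs[j])) + b
            label = 1.0 if i == j else -1.0  # positive on the diagonal, negative elsewhere
            total += log_sigmoid(label * logit)
    return -total / n  # normalized by batch size, as in the published pseudocode

# Hypothetical toy batch of unit vectors.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = siglip_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = siglip_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

Since each term depends only on its own pair, the loss can be computed in chunks without first gathering the full batch-wide similarity matrix for a softmax.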

© 2025 AIbase